Learning Recurring Concepts from Data Streams in Ubiquitous Environments

Author

  • João Paulo Bártolo Gomes
Abstract

Due to recent scientific and technological advances in information systems, it is now possible to continuously record data at high speeds in a wide range of devices. The need to make sense of such massive amounts of data opens an opportunity to create new data stream classification techniques to model and predict the behavior of streaming data. When learning from data streams, the problem of concept drift means that the underlying data distributions can change over time. This has a strong impact on classification techniques, as predictive models become invalid and have to be updated. Furthermore, these changes in concept are usually a consequence of changes in context, and this relationship could be exploited to handle concept drift.

Recurring concepts are a particular case of concept drift, in which concepts that have drifted can suddenly reoccur. In this situation it may be possible to avoid relearning these previously observed concepts. However, the few existing approaches that take advantage of concept recurrence are designed neither to take context into consideration nor to account for the resources required to store representations of past concepts. Both issues are of particular significance for ubiquitous data stream mining, where the learning process is executed in dynamically changing environments using resource-constrained devices.

Moreover, most existing techniques assume that the underlying data stream feature space is static. However, in many real-world applications the set of features and their relevance to the target concept may change over time. Despite its importance, this issue has received little attention, particularly with regard to how it can be efficiently addressed when tracking recurring concepts.

Sharing knowledge among ubiquitous devices to collaboratively improve the modeling of local concepts is another interesting idea that has not been properly explored. This could improve the accuracy of the local model, as it would benefit from patterns similar to the local concept that were observed in other ubiquitous devices but not yet locally. In addition, the deployment of data stream classification as an autonomous and adaptive service to support the data analysis requirements of ubiquitous applications is still an open issue that lacks research in the field of ubiquitous data stream mining.

This PhD thesis addresses the aforementioned open issues, focusing on learning anytime, anywhere classification models from data streams in ubiquitous environments, where the underlying concepts may change over time, with special emphasis on recurring concepts. Four main contributions are presented:

  • The MReC (Mining Recurring Concepts) approach, which integrates context with previously learned concepts to improve the adaptation to recurring concepts. Moreover, to deal with situations of resource constraints, an intelligent strategy to discard models is also proposed.
  • The MReC-DFS (Mining Recurring Concepts in a Dynamic Feature Space) approach, which extends MReC to address the challenges of a dynamic feature space while simultaneously reducing the memory cost of storing past models. In addition, a novel incremental feature selection method is proposed that dynamically determines the threshold used to select the most relevant features for a certain concept.
  • A Collaborative Data Stream Mining (Coll-Stream) approach that explores the knowledge available in the community to improve local classification accuracy. Coll-Stream integrates community knowledge using an ensemble method in which the classifiers are selected and weighted based on their local accuracy for different partitions of the instance space.
  • A UDSM (Ubiquitous Data Stream Mining) Service to support the data analysis requirements of ubiquitous applications. As the basis for the service, a general mechanism is described that autonomously adapts the execution of the data stream classification process to each situation, using context and resource awareness.

Finally, the experimental validation of the proposed contributions using synthetic and real datasets allows us to achieve the objectives and answer the research questions proposed for this dissertation.
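
The general idea of reusing stored models when a concept recurs, under a memory limit, can be illustrated with a minimal Python sketch. This is not the MReC algorithm from the thesis, whose details are not given in the abstract. The names `ConceptPool`, `StoredConcept`, `accuracy`, the 0.8 accuracy threshold, the exact-match context comparison, and the "discard the least reused model" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Features = Tuple[float, ...]
Instance = Tuple[Features, int]      # (feature vector, class label)
Model = Callable[[Features], int]    # any function that predicts a label


def accuracy(model: Model, window: Sequence[Instance]) -> float:
    """Fraction of instances in the window that the model labels correctly."""
    if not window:
        return 0.0
    return sum(model(x) == y for x, y in window) / len(window)


@dataclass
class StoredConcept:
    model: Model
    context: Dict[str, str]   # context observed while the model was active
    reuse_count: int = 0


class ConceptPool:
    """Bounded store of past models (illustrative only, not the thesis method)."""

    def __init__(self, capacity: int, min_accuracy: float = 0.8):
        self.capacity = capacity
        self.min_accuracy = min_accuracy   # assumed reuse threshold
        self.concepts: List[StoredConcept] = []

    def store(self, model: Model, context: Dict[str, str]) -> None:
        # Assumed discard rule: when full, drop the least reused model.
        if len(self.concepts) >= self.capacity:
            victim = min(self.concepts, key=lambda c: c.reuse_count)
            self.concepts.remove(victim)
        self.concepts.append(StoredConcept(model, dict(context)))

    def retrieve(self, recent: Sequence[Instance],
                 context: Dict[str, str]) -> Optional[Model]:
        """Return a stored model that fits the recent labeled window, or None.

        Among models that reach min_accuracy, those whose stored context
        equals the current context are preferred, then higher accuracy.
        """
        candidates = []
        for concept in self.concepts:
            acc = accuracy(concept.model, recent)
            if acc >= self.min_accuracy:
                candidates.append((concept.context == context, acc, concept))
        if not candidates:
            return None   # no stored concept fits: learn a new model instead
        best = max(candidates, key=lambda t: (t[0], t[1]))[2]
        best.reuse_count += 1
        return best.model
```

In such a sketch, a drift detector (not shown) would call `retrieve` with a short window of recent labeled instances after signalling a change, and fall back to training a new model only when `None` is returned.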

Similar resources

A New History Based Method to Handle the Recurring Concept Shifts in Data Streams

Recent developments in storage technology and networking architectures have made it possible for broad areas of applications to rely on data streams for quick response and accurate decision making. Data streams are generated from real-world events, so it is logical that the associations among the occurrences of these events also exist among the concepts of the data streams. Extraction ...

New Management Operations on Classifiers Pool to Track Recurring Concepts

In recent years, handling recurring concepts has attracted interest as a challenging problem in the field of data stream classification. One main feature of data streams is that they appear in nonstationary environments. This means that the concept from which the data are drawn changes over time. If, after a long enough time, the concept reverts to one of the previous concepts, it is said th...

Using a Classifier Pool in Accuracy Based Tracking of Recurring Concepts in Data Stream Classification

Data streams have some unique properties which make them applicable in precise modeling of many real data mining applications. The most challenging property of data streams is the occurrence of "concept drift". Recurring concepts are a type of concept drift that can be seen in most real-world problems. Detecting recurring concepts makes it possible to exploit previous knowledge obtained in...

Mining Recurrent Concepts in Data Streams Using the Discrete Fourier Transform

In this research we address the problem of capturing recurring concepts in a data stream environment. Recurrence capture enables the re-use of previously learned classifiers without the need for re-learning while providing for better accuracy during the concept recurrence interval. We capture concepts by applying the Discrete Fourier Transform (DFT) to Decision Tree classifiers to obtain highly...

Learning Predictive Generalizations for Multiple Streams: An Incremental Algorithm

We present an approach to learning complex dependencies among multiple streams of time-series data incrementally. Given a set of input streams that contain categorical values that change over time, we characterize recurring structure with a set of dependency rules that can be used to predict stream values in the future. These rules are general in the sense of ignoring noisy values in streams.

Publication date: 2010